Semiparametric Principal Component Analysis
Abstract
We propose two new principal component analysis methods based on a semiparametric model: Copula Component Analysis (COCA) and Copula PCA. The semiparametric model assumes that, after unspecified marginally monotone transformations, the data follow a multivariate Gaussian distribution. COCA and Copula PCA estimate the leading eigenvectors of the correlation and covariance matrices, respectively, of the latent Gaussian distribution. Estimation exploits Spearman’s rho, a robust nonparametric rank-based correlation coefficient estimator. We prove that, under suitable conditions, even though the marginal distributions may be arbitrary continuous distributions, the COCA and Copula PCA estimators attain fast estimation rates and are feature selection consistent in the setting where the dimension is nearly exponentially large relative to the sample size. Careful numerical experiments on synthetic and real data are conducted to back up the theoretical results. We also discuss the relationship with the transelliptical component analysis proposed by Han and Liu (2012).
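The rank-based estimation idea described in the abstract can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not spell out: Spearman's rho is mapped to the latent Gaussian correlation via the standard Gaussian-copula identity r = 2 sin(π ρ / 6), and the leading eigenvector is taken from a plain eigendecomposition rather than from the sparse estimator that the feature selection result would require. The function names are hypothetical and the data are assumed to have no ties.

```python
import numpy as np


def spearman_rho_matrix(X):
    """Spearman's rho for every pair of columns of X (assumes no ties)."""
    # A double argsort converts each column into ranks 0..n-1.
    ranks = np.argsort(np.argsort(X, axis=0), axis=0).astype(float)
    # Spearman's rho is the Pearson correlation of the ranks.
    return np.corrcoef(ranks, rowvar=False)


def latent_correlation(X):
    """Estimate the latent Gaussian correlation matrix from ranks.

    Assumption: the Gaussian-copula identity r = 2 * sin(pi * rho / 6)
    links Spearman's rho to the latent Pearson correlation.
    """
    rho = spearman_rho_matrix(X)
    R = 2.0 * np.sin(np.pi / 6.0 * rho)
    np.fill_diagonal(R, 1.0)  # guard against rounding on the diagonal
    return R


def coca_leading_eigenvector(X):
    """Leading eigenvector of the rank-based correlation estimate.

    Illustrative only: a dense eigendecomposition, with no sparsity
    constraint on the eigenvector.
    """
    R = latent_correlation(X)
    eigvals, eigvecs = np.linalg.eigh(R)  # eigenvalues in ascending order
    return eigvecs[:, -1]
```

Because ranks do not change under strictly increasing transformations of each variable, this estimate is the same whether it is computed on the observed data or on the unobserved latent Gaussian variables, which is what makes the approach insensitive to the marginal distributions.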
Similar resources
Nonlinear Multidimensional Data Projection and Visualisation
Multidimensional data projection and visualisation are becoming increasingly important and have found wide applications in many fields such as decision support, bioinformatics and web/document organisation. Various methods and algorithms have been proposed as either nonparametric or semiparametric approaches. This paper provides an overview of the subject and reviews some recent developments. R...
Principal Component Analysis on non-Gaussian Dependent Data
In this paper, we analyze the performance of a semiparametric principal component analysis named Copula Component Analysis (COCA) (Han & Liu, 2012) when the data are dependent. The semiparametric model assumes that, after unspecified marginally monotone transformations, the distributions are multivariate Gaussian. We study the scenario where the observations are drawn from non-i.i.d. processes ...
Semiparametric principal component poisson regression on clustered data
In modelling count data with multivariate predictors, we often encounter problems with clustering of observations and interdependency of predictors. We propose to use principal components of predictors to mitigate the multicollinearity problem and to abate information losses due to dimension reduction, a semiparametric link between the count dependent variable and the principal components is po...
Robust Sparse Principal Component Regression under the High Dimensional Elliptical Model
In this paper we focus on principal component regression and its application to high dimensional non-Gaussian data. The major contributions are twofold. First, in low dimensions and under the Gaussian model, by borrowing strength from recent developments in minimax optimal principal component estimation, we for the first time sharply characterize the potential advantage of classical principal co...
On Estimating the Mixed Effects Model
This paper introduces a new estimation method for time-varying individual effects in a panel data model. An important application is the estimation of time-varying technical inefficiencies of individual firms using the fixed effects model. Most models of the stochastic frontier production function require rather strong assumptions about the distribution of technical inefficiency (e.g., half-nor...